Sigma point policy iteration
Authors
Abstract
In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear value function approximation. These algorithms rely on policy-dependent expectations of the transition and reward functions, which require all experience to be remembered and iterated over for each new policy evaluated. We propose to summarize experience with a compact policy-independent Gaussian model. We show how this policy-independent model can be transformed into a policy-dependent form and used to perform policy evaluation. Because closed-form transformations are rarely available, we introduce an efficient sigma point approximation. We show that the resulting Sigma-Point Policy Iteration algorithm (SPPI) is mathematically equivalent to LSPI for tabular representations and empirically demonstrate comparable performance for approximate representations. However, the experience does not need to be saved or replayed, meaning that for even moderate amounts of experience, SPPI is an order of magnitude faster than LSPI.
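The key step described in the abstract is pushing a Gaussian summary of experience through a nonlinear, policy-dependent mapping. The sketch below is a generic sigma-point (unscented) transform in Python, intended only to illustrate that idea; the mapping `f`, the example inputs, and the parameter settings are illustrative assumptions, not the authors' SPPI implementation.

```python
import numpy as np

def unscented_transform(mu, Sigma, f, alpha=1.0, beta=2.0, kappa=0.0):
    """Propagate a Gaussian N(mu, Sigma) through a nonlinear function f
    using 2n+1 sigma points and re-estimate the output mean/covariance."""
    n = mu.shape[0]
    lam = alpha**2 * (n + kappa) - n

    # Sigma points: the mean plus/minus columns of a scaled matrix square root.
    S = np.linalg.cholesky((n + lam) * Sigma)
    points = np.vstack([mu, mu + S.T, mu - S.T])      # shape (2n+1, n)

    # Standard unscented-transform weights for mean and covariance.
    w_m = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    w_c = w_m.copy()
    w_m[0] = lam / (n + lam)
    w_c[0] = lam / (n + lam) + (1 - alpha**2 + beta)

    # Push each sigma point through f, then recombine into a Gaussian.
    Y = np.array([f(x) for x in points])
    mu_y = w_m @ Y
    diff = Y - mu_y
    Sigma_y = diff.T @ (w_c[:, None] * diff)
    return mu_y, Sigma_y

# Toy usage: a hypothetical nonlinear, policy-dependent transformation.
mu = np.array([0.0, 1.0])
Sigma = np.array([[0.5, 0.1], [0.1, 0.3]])
f = lambda x: np.array([np.sin(x[0]), x[0] * x[1]])
print(unscented_transform(mu, Sigma, f))
```

In this sketch the transform needs only the Gaussian's mean and covariance, which mirrors the abstract's point that experience can be summarized once, policy-independently, and re-transformed for each new policy without replaying the raw data.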
Similar papers
Point-Based Policy Iteration
We describe a point-based policy iteration (PBPI) algorithm for infinite-horizon POMDPs. PBPI replaces the exact policy improvement step of Hansen’s policy iteration with point-based value iteration (PBVI). Despite being an approximate algorithm, PBPI is monotonic: At each iteration before convergence, PBPI produces a policy for which the values increase for at least one of a finite set of init...
New three-step iteration process and fixed point approximation in Banach spaces
In this paper we propose a new iteration process, called the $K^{ast }$ iteration process, for approximation of fixed points. We show that our iteration process is faster than the existing well-known iteration processes using numerical examples. Stability of the $K^{ast}$ iteration process is also discussed. Finally we prove some weak and strong convergence theorems for Suzuki ge...
Solving time-fractional chemical engineering equations by modified variational iteration method as fixed point iteration method
The variational iteration method (VIM) was extended to find approximate solutions of fractional chemical engineering equations. The Lagrange multipliers of the VIM were not identified explicitly. In this paper we improve the VIM by using the concept of the fixed point iteration method. This method was then implemented to solve a system of time-fractional chemical engineering equations. The ob...
Policy Iteration in Finite Templates Domain
We prove in this paper that policy iteration can be generally defined in a finite domain of templates using Lagrange duality. Such a policy iteration algorithm converges to a fixed point when a very simple technical condition holds. This fixed point furnishes a safe over-approximation of the set of reachable values taken by the variables of a program. We also prove that policy iteration can be ea...
Combined Fixed Point and Policy Iteration for HJB Equations in Finance
Implicit methods for Hamilton-Jacobi-Bellman (HJB) partial differential equations give rise to highly nonlinear discretized algebraic equations. The classic policy iteration approach may not be efficient in many circumstances. In this article, we derive sufficient conditions to ensure convergence of a combined fixed point-policy iteration scheme for solution of the discretized equations. Numeri...